Large scale K-means clustering using GPUs
Authors
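Abstract
The k-means algorithm is widely used for clustering, compressing, and summarizing vector data. We present a fast and memory-efficient GPU-based algorithm for exact k-means, Asynchronous Selective Batched K-means (ASB K-means). Unlike most GPU-based k-means algorithms, which require loading the whole dataset onto the GPU, the amount of GPU memory required to run our algorithm can be chosen to be much smaller than the size of the dataset. Thus, the algorithm can cluster datasets whose size exceeds the available GPU memory. The algorithm works in a batched fashion and applies the triangle inequality in each k-means iteration to omit a data point if its membership assignment, i.e., the cluster it belongs to, remains unchanged. This significantly reduces the number of data points that need to be transferred between the CPU's RAM and the GPU's global memory, enabling the algorithm to process large datasets very efficiently. Our algorithm can be substantially faster than a GPU-based implementation of standard k-means even in situations where applying the standard algorithm is feasible because the whole dataset fits into GPU memory. Experiments show that ASB K-means can run up to 15 times faster than a standard GPU-based implementation of k-means, and that it also outperforms the GPU-based k-means implementation in NVIDIA's open-source RAPIDS machine learning library on all datasets used in the experiments.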
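The selection step described in the abstract can be illustrated with a small CPU-only sketch. It is not the authors' implementation: it assumes a Hamerly-style pair of bounds per point (an upper bound on the distance to the assigned centroid and a lower bound on the distance to every other centroid), and the function names `assign_with_bounds` and `select_points_to_transfer` are hypothetical. A point whose upper bound stays at or below its lower bound after the centroids move cannot change cluster, so in a batched GPU setting it would not need to be copied to the device for that iteration.

```python
import math


def dist(p, q):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def assign_with_bounds(point, centroids):
    """Return (nearest centroid index, distance to it, distance to the second nearest)."""
    ranked = sorted((dist(point, c), j) for j, c in enumerate(centroids))
    return ranked[0][1], ranked[0][0], ranked[1][0]


def select_points_to_transfer(upper, lower, assign, shifts):
    """Update the bounds in place and return indices whose assignment may change.

    Assumed Hamerly-style rule (not taken from the paper):
      upper[i] grows by the shift of the assigned centroid,
      lower[i] shrinks by the largest shift of any centroid.
    By the triangle inequality, upper[i] <= lower[i] guarantees that the
    assigned centroid is still a nearest one, so the point can be skipped.
    """
    max_shift = max(shifts)
    selected = []
    for i in range(len(upper)):
        upper[i] += shifts[assign[i]]
        lower[i] -= max_shift
        if upper[i] > lower[i]:
            selected.append(i)
    return selected


# Toy data: four 2-D points and two centroids (illustrative values only).
points = [(0.0, 0.0), (0.2, 0.0), (4.6, 0.0), (10.0, 0.0)]
old_centroids = [(0.0, 0.0), (10.0, 0.0)]
new_centroids = [(0.5, 0.0), (10.0, 0.0)]

# Initial exact assignment gives tight bounds for every point.
assign, upper, lower = [], [], []
for p in points:
    a, u, l = assign_with_bounds(p, old_centroids)
    assign.append(a)
    upper.append(u)
    lower.append(l)

# How far each centroid moved in the update step.
shifts = [dist(o, n) for o, n in zip(old_centroids, new_centroids)]

# Only these points would have to be sent to the GPU for re-assignment.
selected = select_points_to_transfer(upper, lower, assign, shifts)
print(selected)
```

In this toy run only the point near the boundary between the two clusters is selected; the other three are filtered out without computing any new distances, which is the effect the abstract credits for the reduced CPU-to-GPU traffic.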
Similar resources
A Large Scale Clustering Scheme for Kernel K-Means
Kernel functions can be viewed as a non-linear transformation that increases the separability of the input data by mapping them to a new high dimensional space. The incorporation of a kernel function enables the K-Means algorithm to explore the inherent data pattern in the new space. However, the recent applications of the kernel K-Means algorithm are confined to small corpora due to its expensive com...
Distributed Kernel K-Means for Large Scale Clustering
Clustering samples according to an effective metric and/or vector space representation is a challenging unsupervised learning task with a wide spectrum of applications. Among several clustering algorithms, k-means and its kernelized version have still a wide audience because of their conceptual simplicity and efficacy. However, the systematic application of the kernelized version of k-means is ...
k²-means for fast and accurate large scale clustering
We propose k²-means, a new clustering method which efficiently copes with large numbers of clusters and achieves low energy solutions. k²-means builds upon the standard k-means (Lloyd's algorithm) and combines a new strategy to accelerate the convergence with a new low time complexity divisive initialization. The accelerated convergence is achieved through only looking at k_n nearest clusters and ...
Genetic Weighted K-means for Large-Scale Clustering Problems
This paper proposes a genetic weighted K-means algorithm called GWKMA, which is a hybridization of a genetic algorithm (GA) and a weighted K-means algorithm (WKMA). GWKMA encodes each individual by a partitioning table which uniquely determines a clustering, and employs three genetic operators (selection, crossover, mutation) and a WKMA operator. The superiority of the GWKMA over the WKMA and o...
A Parallel Implementation of K-Means Clustering on GPUs
Graphics Processing Units (GPUs) have recently been the subject of attention in research as an efficient coprocessor for implementing many classes of highly parallel applications. The GPU's design is engineered for graphics applications, where many independent SIMD workloads are simultaneously dispatched to processing elements. While parallelism has been explored in the context of traditional CPU...
Journal
Journal title: Data Mining and Knowledge Discovery
Year: 2022
ISSN: ['1573-756X', '1384-5810']
DOI: https://doi.org/10.1007/s10618-022-00869-6